Collaborating Authors

Holden Karau


Pre-Spark Summit Meetup in Dublin, Ireland

@machinelearnbot

Since the creation of Apache Spark, I/O throughput has increased at a faster pace than processing speed. In many big data applications, the bottleneck is increasingly the CPU. With the release of Apache Spark 2.0 and Project Tungsten, Spark runs a number of control operations close to the metal. At the same time, there has been a surge of interest in using GPUs (the Graphics Processing Units of video cards) for general-purpose applications, and a number of frameworks have been proposed to do numerical computations on GPUs. In this talk, we will discuss how to combine Apache Spark with TensorFlow, a new framework from Google that provides building blocks for Machine Learning computations on GPUs.


Extend structured streaming for Spark ML – Inside Machine learning – Medium

#artificialintelligence

To learn more about Spark's Machine Learning APIs, check out Holden Karau and Seth Hendrickson's session Extending Spark ML at Spark Summit West 2017 on Tuesday, June 6, at 2:00 PM (Room 2). Spark's new alpha Structured Streaming API has caused a lot of excitement because it brings the Dataset/DataFrame/SQL APIs into a streaming context. In this initial version of Structured Streaming, the machine learning APIs have not yet been integrated. However, this doesn't stop us from having fun exploring how to get machine learning to work with Structured Streaming. For our Spark Structured Streaming for machine learning talk at Strata Hadoop World New York 2016, we've started early proof-of-concept work to integrate Structured Streaming and machine learning, available in the spark-structured-streaming-ml repo.


Learning Spark: Lightning-Fast Big Data Analysis: Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia: 9781449358624: Amazon.com: Books

@machinelearnbot

Holden Karau is a transgender Canadian and an active open source contributor. When not in San Francisco working as a software development engineer at IBM's Spark Technology Center, Holden talks internationally on Spark and holds office hours at coffee shops at home and abroad. She makes frequent contributions to Spark, specializing in PySpark and Machine Learning. Prior to IBM she worked on a variety of distributed, search, and classification problems at Alpine, Databricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science.


High Performance Spark: Best practices for scaling and optimizing Apache Spark: Holden Karau, Rachel Warren: 9781491943205: Amazon.com: Books

@machinelearnbot

Holden Karau is a transgender Canadian and an active open source contributor. When not in San Francisco working as a software development engineer at IBM's Spark Technology Center, Holden talks internationally on Spark and holds office hours at coffee shops at home and abroad. She makes frequent contributions to Spark, specializing in PySpark and Machine Learning. Prior to IBM she worked on a variety of distributed, search, and classification problems at Alpine, Databricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science.


Extend structured streaming for Spark ML

#artificialintelligence

To learn more about Structured Streaming and Machine Learning, check out Holden Karau and Seth Hendrickson's session Spark Structured Streaming for machine learning at Strata Hadoop World New York, September 26-29, 2016. Spark's new alpha Structured Streaming API has caused a lot of excitement because it brings the Dataset/DataFrame/SQL APIs into a streaming context. In this initial version of Structured Streaming, the machine learning APIs have not yet been integrated. However, this doesn't stop us from having fun exploring how to get machine learning to work with Structured Streaming. For our Spark Structured Streaming for machine learning talk at Strata Hadoop World New York 2016, we've started early proof-of-concept work to integrate Structured Streaming and machine learning, available in the spark-structured-streaming-ml repo.


Who will be speaking at Data Day Texas?

#artificialintelligence

We had a pretty incredible line-up for Data Day Texas 2016 -- and we intend to exceed your expectations again for 2017. Tell us whom you want to see, what topics you want to learn about, and let us make it happen. Please share your thoughts at suggestions@datadaytexas.com. If you wish to propose a talk or workshop, please visit the Data Day Proposals page. Her commercial applications of data science include developing predictive maintenance models for oil and gas pipelines at Deep Signal, and designing/building a platform for real-time model application, data storage, and model building at WibiData.